Prediction Intervals with the Dirichlet Prior

by

Gregory Campbell and Myles Hollander

Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_N$ be consecutive samples from a Dirichlet
process on $(\mathbb{R}, \mathcal{B})$ (the real line $\mathbb{R}$ with the Borel $\sigma$-field $\mathcal{B}$) with parameter
$\alpha$. Typically, prediction intervals employ the previous observations $X_1, \ldots, X_n$
in order to predict a specified function of the future sample $Y_1, \ldots, Y_N$.

Here one- and two-sided prediction intervals for at least k of N future observations
are developed for the situation in which, in addition to the previous
sample, there is prior information available. The information is specified
via the parameter $\alpha$ of the Dirichlet process.


Key Words: Prediction intervals; Dirichlet process; Bayesian nonparametric
methods; Coverage property.


1. INTRODUCTION

Let $X_1, \ldots, X_n$ be a random sample of size n from a distribution function
F. Let $Y_1, \ldots, Y_N$ be a second random sample of size N from the same distribution
function F and let $g(Y_1, \ldots, Y_N)$ be some function of these random variables.
Then, if $L_1(X_1, \ldots, X_n)$ and $L_2(X_1, \ldots, X_n)$ are statistics based on the
initial sample, $[L_1, L_2]$ is said to be a $100\gamma$ percent prediction interval for
$g(Y_1, \ldots, Y_N)$ if

$$\Pr\{L_1(X_1, \ldots, X_n) \le g(Y_1, \ldots, Y_N) \le L_2(X_1, \ldots, X_n)\} = \gamma.$$

* Research sponsored by the Air Force Office of Scientific Research, AFSC, USAF,
under Grants AFOSR-74-2581B and AFOSR-76-3109. The United States Government
is authorized to reproduce and distribute reprints for governmental purposes.
Parametric prediction intervals have been considered by many authors,
including Proschan (1953), Chew (1966), and Hahn (1969, 1970a, 1970b, 1977).
Wilks (1942, 1962) introduced nonparametric prediction intervals for the case
in which F is an unknown continuous distribution function and one is interested
in intervals to contain at least k of N future observations. Fligner and Wolfe
(1976) have approached nonparametric prediction intervals via a sample analogue
to the probability integral transformation and to a coverage property (see
Section 4). In particular, they have reviewed the results of Wilks, developed
additional prediction intervals, and generalized prediction intervals to the
case of an unknown discontinuous distribution function. A Bayesian approach to
prediction intervals is presented in Guttman (1970).

This paper combines nonparametric and Bayesian approaches to develop
intervals which allow the use of both prior information and the data of the
initial sample, without requiring strong parametric assumptions. Our Bayesian
nonparametric prediction intervals are derived using Ferguson's (1973) Dirichlet
process prior on the space of distribution functions. The Dirichlet process is
introduced in Section 2. Section 3 presents the construction of one-sided
Bayesian nonparametric prediction intervals for at least k of N future
observations. The possibility of a coverage property for a sample from a
Dirichlet process is investigated in Section 4. Section 4 also contains some
useful results concerning the distribution of the order statistics from a
Dirichlet sample. The two-sided prediction interval problem with prior
information in the form of a Dirichlet process prior is solved in Section 5. The
final section contains an example which illustrates the procedure of constructing
Bayesian nonparametric prediction intervals, and discusses the implementation
of such prediction intervals.
2. PRELIMINARIES

Let $Z_1, \ldots, Z_k$ be independent gamma random variables with shape parameters
$\alpha_i \ge 0$ and scale parameter 1, $i = 1, \ldots, k$. Define $Y_i = Z_i / \sum_{j=1}^{k} Z_j$. If
$\sum_{i=1}^{k} \alpha_i > 0$, then $(Y_1, \ldots, Y_k)$ is said to have a Dirichlet distribution with
parameter $(\alpha_1, \ldots, \alpha_k)$. If all the $\alpha_i$ are strictly positive, the distribution
of $(Y_1, \ldots, Y_{k-1})$ is absolutely continuous with density

$$f(y_1, \ldots, y_{k-1}) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_k)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_k)}\, y_1^{\alpha_1 - 1} \cdots y_{k-1}^{\alpha_{k-1} - 1} \Bigl(1 - \sum_{i=1}^{k-1} y_i\Bigr)^{\alpha_k - 1}, \qquad (y_1, \ldots, y_{k-1}) \in S,$$

where S denotes the simplex $y_i \ge 0$ for $i = 1, \ldots, k-1$ and $\sum_{i=1}^{k-1} y_i \le 1$. The Dirichlet
distribution is also called the multi-beta, in that for $k = 2$ it reduces to the
beta distribution.

The following expression for the $(r_1, \ldots, r_t)$th moment of the distribution of
$(Y_1, \ldots, Y_t)$, for $t \le k$ and each $r_i$ a nonnegative integer, will be useful in the
sequel:

$$E\bigl(Y_1^{r_1} \cdots Y_t^{r_t}\bigr) = \frac{\Gamma(\alpha_1 + r_1) \cdots \Gamma(\alpha_t + r_t)\, \Gamma(\alpha)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_t)\, \Gamma(\alpha + r)}, \qquad (2.1)$$

where $\alpha = \sum_{i=1}^{k} \alpha_i$ and $r = \sum_{j=1}^{t} r_j$. (For a proof of this result and a more complete
treatment of the Dirichlet distribution see Wilks (1962). For further background
on the Dirichlet distribution and its generalizations, see, for example,
Connor and Mosimann (1969) and Hood (1965).) Let $y^{[k]}$ denote the ascending
factorial $y(y+1) \cdots (y+k-1)$ with $y^{[0]} = 1$. Then the right-hand side of (2.1)
can be rewritten as $\alpha_1^{[r_1]} \cdots \alpha_t^{[r_t]} / \alpha^{[r]}$.
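As a small numerical check of (2.1), the sketch below evaluates the moment both in its gamma-function form and in the ascending-factorial form $\alpha_1^{[r_1]} \cdots \alpha_t^{[r_t]} / \alpha^{[r]}$. The function names are hypothetical, and all $\alpha_i$ are assumed strictly positive so that `math.lgamma` is defined.

```python
import math


def ascending_factorial(y: float, k: int) -> float:
    """Return y^[k] = y(y+1)...(y+k-1), with y^[0] = 1."""
    out = 1.0
    for j in range(k):
        out *= y + j
    return out


def moment_gamma_form(alphas, rs):
    """E(Y_1^r_1 ... Y_t^r_t) from the gamma-function form of (2.1).

    alphas is the full parameter (alpha_1, ..., alpha_k), all > 0;
    rs holds the t <= k nonnegative integer exponents r_1, ..., r_t.
    """
    alpha, r = sum(alphas), sum(rs)
    log_val = math.lgamma(alpha) - math.lgamma(alpha + r)
    for a_i, r_i in zip(alphas, rs):  # zip stops after the t exponents
        log_val += math.lgamma(a_i + r_i) - math.lgamma(a_i)
    return math.exp(log_val)


def moment_factorial_form(alphas, rs):
    """The same moment written as alpha_1^[r_1]...alpha_t^[r_t] / alpha^[r]."""
    num = 1.0
    for a_i, r_i in zip(alphas, rs):
        num *= ascending_factorial(a_i, r_i)
    return num / ascending_factorial(sum(alphas), sum(rs))


# k = 3, t = 2: E(Y_1^2 Y_2) for parameter (0.5, 2.0, 1.5); both are 0.0125 up to rounding
print(moment_gamma_form((0.5, 2.0, 1.5), (2, 1)))
print(moment_factorial_form((0.5, 2.0, 1.5), (2, 1)))
```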

The Dirichlet process on the real line can now be defined. Let $\alpha$ be a
finite nonnegative measure on the real line $\mathbb{R}$ with Borel $\sigma$-field $\mathcal{B}$. Then P is a
Dirichlet process on $(\mathbb{R}, \mathcal{B})$ with parameter $\alpha$ if, for every $m = 1, 2, \ldots$ and every
measurable partition $(B_1, \ldots, B_m)$ of $\mathbb{R}$, $(P(B_1), \ldots, P(B_m))$ has a Dirichlet
distribution with parameter $(\alpha(B_1), \ldots, \alpha(B_m))$. This process gives rise to a
probability on the set of distribution functions, as shown in the landmark
paper of Ferguson (1973). By a sample from the process, it will be understood
that a distribution function F is chosen by this probability and then a random
sample obtained from F. (See Ferguson (1973) and Berk and Savage (1977) for a
more rigorous mathematical treatment.) The tractability of Ferguson's approach
lies in part in the following result (Theorem 1 of Ferguson, 1973). The
posterior distribution of the Dirichlet process P with parameter $\alpha$, given a
sample $X_1, \ldots, X_n$ from P, is again a Dirichlet process, with the updated
measure $\alpha + \sum_{i=1}^{n} \delta_{X_i}$ as its parameter, where $\delta_z$ is the measure which
concentrates all its mass of one at the point z.
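The posterior result above implies that, given $X_1, \ldots, X_i$, the next observation from the process has distribution $(\alpha + \sum_{j \le i} \delta_{X_j}) / (\alpha(\mathbb{R}) + i)$. Drawing sequentially from these predictive distributions (often called a Pólya urn scheme; it is not used in this paper) is one way to simulate "a sample from the process" without constructing F. The sketch assumes $\alpha = c \cdot G$ with c > 0 and a user-supplied sampler for G; all names are illustrative.

```python
import random


def sample_dirichlet_process(n, total_mass, base_sampler, rng=None):
    """Draw X_1, ..., X_n from a Dirichlet process with parameter
    alpha = total_mass * G (total_mass > 0), where base_sampler(rng) draws from G.

    Given X_1, ..., X_i, the next draw comes from
    (alpha + sum_j delta_{X_j}) / (total_mass + i).
    """
    rng = rng if rng is not None else random.Random()
    xs = []
    for i in range(n):
        if rng.random() < total_mass / (total_mass + i):
            xs.append(base_sampler(rng))   # fresh value drawn from G = alpha / alpha(R)
        else:
            xs.append(rng.choice(xs))      # repeat an earlier value, producing a tie
    return xs


sample = sample_dirichlet_process(20, 1.0, lambda r: r.random(), random.Random(0))
print(len(set(sample)), "distinct values among", len(sample))
```

With a small $\alpha(\mathbb{R})$ most draws repeat earlier values, which is the source of the ties discussed in Section 4.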

For the purposes of this paper, F is taken to be a random distribution
function from Ferguson's Dirichlet process prior. Given F, the first sample
$X_1, \ldots, X_n$ is a random sample from F. The second sample $Y_1, \ldots, Y_N$ is then a
sample from the conditional Dirichlet process, given $X_1, \ldots, X_n$. One
wishes to predict a specified function of the second sample. In particular,
several prediction intervals are obtained to contain at least q of the N
future observations.

3. ONE-SIDED PREDICTION INTERVALS WITH THE DIRICHLET PRIOR

In this section $100\gamma$ percent prediction intervals of the form $(x, \infty)$ are
found for at least q of N future observations. Let

$$R(x) = \Pr\{x < \text{at least } q \text{ of the } N\ Y\text{'s} < \infty\}. \qquad (3.1)$$

Note that R(x) is decreasing in x. The problem is to find $x_0$ such that
$R(x_0) = \gamma$, for then $(x_0, \infty)$ is the desired interval.
Unlike the nonparametric prediction intervals of Wilks (1942, 1962)
and Fligner and Wolfe (1976), it is possible, using the Dirichlet process
prior, to form prediction intervals for the case of no initial sample of
X's (i.e., n = 0). Call this problem the "no data" problem. This problem
is first solved and then extended in a natural way to obtain the solution
of the "data" problem (n > 0).

For fixed x, let $I_x$, $J_x$, and $K_x$ denote the random variables for the
number of Y's that are less than, equal to, and greater than x, respectively.
In the "no data" problem, $Y_1, \ldots, Y_N$ is merely a sample from a Dirichlet
process with parameter $\alpha$. For notational convenience, the subscript x for
I, J, and K is suppressed.

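Theorem 1: For $Y_1, \ldots, Y_N$ a sample from a Dirichlet process with parameter $\alpha$,

$$\Pr\{(I, J, K) = (i, j, k)\} = \binom{N}{i,\, j,\, k} \frac{\alpha(-\infty, x)^{[i]}\, \alpha(\{x\})^{[j]}\, \alpha(x, \infty)^{[k]}}{\alpha(\mathbb{R})^{[N]}}, \qquad (3.2)$$

for nonnegative integers i, j, k with $i + j + k = N$.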

Proof: With the distribution function F given, a multinomial argument yields

$$\Pr\{(I, J, K) = (i, j, k) \mid F\} = \binom{N}{i,\, j,\, k} F(x^-)^{i}\, [F(x) - F(x^-)]^{j}\, [1 - F(x)]^{k}. \qquad (3.3)$$

Integration of both sides of (3.3) with respect to the probability $Q_\alpha$ on the
set of distribution functions gives

$$\Pr\{(I, J, K) = (i, j, k)\} = \binom{N}{i,\, j,\, k} \int F(x^-)^{i}\, [F(x) - F(x^-)]^{j}\, [1 - F(x)]^{k}\, dQ_\alpha(F).$$

Then, by definition of the Dirichlet process, $(F(x^-), F(x) - F(x^-), 1 - F(x))$ has a
Dirichlet distribution with parameter $(\alpha(-\infty, x), \alpha(\{x\}), \alpha(x, \infty))$. Application
of the $(i, j, k)$th moment (2.1) of this Dirichlet distribution yields the
right-hand side of (3.2), completing the proof. ||

The random variables $(I_1, \ldots, I_k)$ are said to have a Dirichlet compound
multinomial distribution (see Johnson and Kotz, 1969, p. 309) with parameters
$N, \alpha_1, \ldots, \alpha_k$ if, for nonnegative integers $i_1, \ldots, i_k$ such that $\sum_{j=1}^{k} i_j = N$,

$$\Pr\{(I_1, \ldots, I_k) = (i_1, \ldots, i_k)\} = \binom{N}{i_1, \ldots, i_k} \frac{\prod_{j=1}^{k} \alpha_j^{[i_j]}}{\bigl(\sum_{j=1}^{k} \alpha_j\bigr)^{[N]}}.$$

The Dirichlet compound multinomial results (as the name indicates) from placing
a Dirichlet distribution on the parameters of a multinomial distribution.
It is clear that the distribution of (I, J, K), given by (3.2), is Dirichlet
compound multinomial with parameters $N, \alpha(-\infty, x), \alpha(\{x\}), \alpha(x, \infty)$.
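A direct transcription of the Dirichlet compound multinomial probability is sketched below; the names `dcm_pmf` and `beta_compound_binomial_pmf` are hypothetical. As a check, summing the three-cell probabilities (3.2) over i reproduces the beta compound binomial marginal of K used below, here for the assumed values $\alpha(-\infty, x) = 1.2$, $\alpha(\{x\}) = 0.5$, $\alpha(x, \infty) = 0.8$ and N = 3.

```python
from math import comb, factorial


def asc(y, k):
    """Ascending factorial y^[k] = y(y+1)...(y+k-1), with y^[0] = 1."""
    out = 1.0
    for j in range(k):
        out *= y + j
    return out


def dcm_pmf(counts, alphas):
    """Dirichlet compound multinomial probability of `counts`
    (nonnegative integers summing to N) with parameters N, alphas."""
    n_total = sum(counts)
    coef = factorial(n_total)
    for c in counts:
        coef //= factorial(c)  # exact: each partial quotient is an integer
    num = 1.0
    for c, a in zip(counts, alphas):
        num *= asc(a, c)
    return coef * num / asc(sum(alphas), n_total)


def beta_compound_binomial_pmf(k, n_total, a_in, a_out):
    """Pr{K = k} for K beta compound binomial with parameters N, a_in, a_out."""
    return (comb(n_total, k) * asc(a_in, k) * asc(a_out, n_total - k)
            / asc(a_in + a_out, n_total))


# (alpha(-inf, x), alpha({x}), alpha(x, inf)) and N: assumed illustrative values
a_lt, a_eq, a_gt, N = 1.2, 0.5, 0.8, 3
for k in range(N + 1):
    marginal = sum(dcm_pmf((i, N - k - i, k), (a_lt, a_eq, a_gt)) for i in range(N - k + 1))
    print(k, marginal, beta_compound_binomial_pmf(k, N, a_gt, a_lt + a_eq))
```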

The one-sided prediction interval problem is to find $x_0$ such that $R(x_0) = \gamma$.
This equation can be rewritten as

$$\sum_{k=q}^{N} \Pr\{\text{exactly } k \text{ of the } N \text{ future } Y \text{ observations} > x_0\} = \gamma.$$

Now, for the "no data" problem,

$$\Pr\{\text{exactly } k \text{ of } N \text{ future observations} > x\} = \Pr\{K = k\}.$$

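Since the distribution of (I, J, K) is Dirichlet compound multinomial, the
distribution of K has what is called a beta compound binomial distribution
or a Pólya-Eggenberger distribution (see Johnson and Kotz, 1969, p. 229).
It follows that

$$\Pr\{K = k\} = \binom{N}{k} \frac{\alpha(-\infty, x]^{[N-k]}\, \alpha(x, \infty)^{[k]}}{\alpha(\mathbb{R})^{[N]}}.$$

Therefore, the solution is sought for the following equation in x:

$$\sum_{k=q}^{N} \binom{N}{k} \frac{\alpha(-\infty, x]^{[N-k]}\, \alpha(x, \infty)^{[k]}}{\alpha(\mathbb{R})^{[N]}} = \gamma. \qquad (3.4)$$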


The monotonicity of R(x) from the definition ensures that, for $0 < \gamma < 1$,
there is either a solution $x_0$ to equation (3.4) or there exists an $x_1$ such
that $R(x_1) < \gamma \le R(x_1^-)$, where $R(x_1^-) = \Pr\{\text{at least } q \text{ of the } N\ Y\text{'s} \ge x_1\}$.
If the Dirichlet parameter $\alpha$ is a nonatomic measure, so that $\alpha(-\infty, t]$ is a
continuous function of t, then the left-hand side of (3.4) is continuous.
Further, since R(x) ranges from 1 to 0, in such a case a solution exists (it
may not be unique). In the second case, if $R(x_1) < \gamma \le R(x_1^-)$, the interval
$[x_1, \infty)$ is a prediction interval for at least q of N future observations with
prediction coefficient at least $\gamma$.
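When $\alpha$ is nonatomic, the left-hand side of (3.4) is continuous and nonincreasing in x, so bisection is one simple way to locate a solution $x_0$. The sketch assumes $\alpha = c \cdot G$ with G a continuous distribution function supplied as `base_cdf`, and a bracketing interval [lo, hi] with $R(lo) \ge \gamma > R(hi)$; the function names and the uniform choice of G are illustrative.

```python
from math import comb


def asc(y, k):
    """Ascending factorial y^[k] = y(y+1)...(y+k-1), with y^[0] = 1."""
    out = 1.0
    for j in range(k):
        out *= y + j
    return out


def R_no_data(x, q, N, total_mass, base_cdf):
    """Left-hand side of (3.4) for alpha = total_mass * G, G with cdf base_cdf."""
    below = total_mass * base_cdf(x)   # alpha(-inf, x]
    above = total_mass - below         # alpha(x, inf)
    return sum(comb(N, k) * asc(below, N - k) * asc(above, k)
               for k in range(q, N + 1)) / asc(total_mass, N)


def solve_one_sided(gamma, q, N, total_mass, base_cdf, lo, hi, tol=1e-10):
    """Bisection for x0 with R(x0) = gamma; R is nonincreasing in x."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if R_no_data(mid, q, N, total_mass, base_cdf) >= gamma:
            lo = mid   # R(mid) is still at least gamma, so x0 lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def uniform_cdf(t):
    return min(max(t, 0.0), 1.0)   # G = Uniform(0, 1), an assumed prior guess


# No-data problem: at least q = 2 of N = 3 future Y's above x0, gamma = 0.9, alpha(R) = 2
x0 = solve_one_sided(0.9, q=2, N=3, total_mass=2.0, base_cdf=uniform_cdf, lo=0.0, hi=1.0)
print(x0, R_no_data(x0, 2, 3, 2.0, uniform_cdf))
```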

The solution to the "data" prediction interval problem is now considered.
Thus, suppose that an initial sample $X_1, \ldots, X_n$ is observed from a Dirichlet
process. The development for the data problem is immediate in that the
Dirichlet process with parameter $\alpha$ is merely replaced by the Dirichlet process
with updated parameter $\alpha' = \alpha + \sum_{i=1}^{n} \delta_{X_i}$, and one proceeds as in the "no
data" problem. Thus, (I, J, K) given $(X_1, \ldots, X_n)$ has a Dirichlet compound
multinomial distribution with parameters $N, \alpha'(-\infty, x), \alpha'(\{x\}), \alpha'(x, \infty)$. The
prediction interval is obtained upon the solution of

$$\sum_{k=q}^{N} \binom{N}{k} \frac{\alpha'(-\infty, x]^{[N-k]}\, \alpha'(x, \infty)^{[k]}}{\alpha'(\mathbb{R})^{[N]}} = \gamma, \qquad (3.5)$$

where $\alpha'(\mathbb{R}) = \alpha(\mathbb{R}) + n$. Here $\alpha'$ is not nonatomic (it has an atom at each $X_i$),
so either a solution $x_0$ exists or there exists an $x_1$ such that $[x_1, \infty)$ is a
prediction interval for at least q of N future observations with prediction
coefficient at least $\gamma$.
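In the data problem the atoms of $\alpha'$ at the observations make R jump, so before solving (3.5) by bisection it is convenient to check each atom $x_1$ for the case $R(x_1) < \gamma \le R(x_1^-)$. The sketch below assumes $\alpha = c \cdot G$ with G continuous, so that the only atoms of $\alpha'$ are the data; `R_data` and `atom_endpoint` are hypothetical names. Under these assumptions, if no atom qualifies, a solution of (3.5) exists and can be located by bisection as in the no-data case.

```python
from math import comb


def asc(y, k):
    """Ascending factorial y^[k] = y(y+1)...(y+k-1), with y^[0] = 1."""
    out = 1.0
    for j in range(k):
        out *= y + j
    return out


def R_data(x, q, N, total_mass, base_cdf, data, closed=False):
    """Left-hand side of (3.5): coefficient of (x, inf), or of [x, inf) when
    closed=True (that is, R(x-)), for alpha' = total_mass * G + sum of delta_{X_i}.
    G is continuous, so G(x-) = G(x) and only the data atoms differ."""
    if closed:
        n_below = sum(1 for d in data if d < x)    # alpha'(-inf, x) counts atoms < x
    else:
        n_below = sum(1 for d in data if d <= x)   # alpha'(-inf, x] counts atoms <= x
    below = total_mass * base_cdf(x) + n_below
    above = total_mass + len(data) - below         # alpha'(R) = alpha(R) + n
    return sum(comb(N, k) * asc(below, N - k) * asc(above, k)
               for k in range(q, N + 1)) / asc(below + above, N)


def atom_endpoint(gamma, q, N, total_mass, base_cdf, data):
    """Return an atom x1 with R(x1) < gamma <= R(x1-), or None if there is none.
    In the first case [x1, inf) has prediction coefficient R(x1-) >= gamma."""
    for d in sorted(set(data)):
        r_open = R_data(d, q, N, total_mass, base_cdf, data)
        r_closed = R_data(d, q, N, total_mass, base_cdf, data, closed=True)
        if r_open < gamma <= r_closed:
            return d
    return None


def uniform_cdf(t):
    return min(max(t, 0.0), 1.0)   # assumed prior guess G = Uniform(0, 1)


# alpha(R) = 1, one observation X_1 = 0.5, N = q = 1, gamma = 0.5:
# R(0.5) = 0.25 < 0.5 <= 0.75 = R(0.5-), so [0.5, inf) is used.
print(atom_endpoint(0.5, 1, 1, 1.0, uniform_cdf, [0.5]))
```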

There are two special cases of note. When q = N, one obtains the one-sided
upper prediction interval for all N future observations; when q = 1, the interval
is the one-sided upper prediction interval for the largest of N future
observations.


4. INVESTIGATION OF THE COVERAGE PROPERTY FOR A DIRICHLET SAMPLE

The coverage property for a continuous distribution function $F_0$ with
$Y_1, \ldots, Y_N$ a random sample from $F_0$ is as follows:

Coverage Property: If $Y_{(1)} \le \cdots \le Y_{(N)}$ denote the order statistics of the
sample $Y_1, \ldots, Y_N$ from $F_0$, then, for integers p and q such that $0 \le p < q \le N+1$,
the distribution of $F_0(Y_{(q)}) - F_0(Y_{(p)})$ is the same distribution as $F_0(Y_{(q-p)})$,
where, by convention, $F_0(Y_{(0)}) = 0$ and $F_0(Y_{(N+1)}) = 1$.
Fligner and Wolfe (1976) have extended the coverage property from the
case of a continuous distribution function to that of the empirical distribution
function $F_n$ of the initial sample $X_1, \ldots, X_n$, also from $F_0$. In
particular, they prove that $F_n(Y_{(q)}) - F_n(Y_{(p)})$ has the
same distribution as $F_n(Y_{(q-p)})$.

A question of interest is whether the coverage property holds for
a sample from a Dirichlet process with parameter $\alpha$. In particular,
is it true that $\alpha(-\infty, Y_{(q)}]/\alpha(\mathbb{R}) - \alpha(-\infty, Y_{(p)}]/\alpha(\mathbb{R})$ has the same distribution
as $\alpha(-\infty, Y_{(q-p)}]/\alpha(\mathbb{R})$? If the coverage property were to hold, it would aid
in constructing two-sided prediction intervals directly from one-sided intervals,
in that if $[Y_{(q-p)}, \infty)$ is a one-sided $100\gamma$ percent prediction interval,
then the two-sided interval with endpoints $Y_{(p)}$ and $Y_{(q)}$ would also be a
$100\gamma$ percent prediction interval for fixed integers p and q with $0 \le p < q \le N+1$.
In that event, one could employ the techniques derived in the preceding section.

However, the coverage property does not hold for samples from a Dirichlet
process. It suffices to demonstrate this for the case N = 2, p = 1, and
q = 2 by comparing the mean of $\alpha(-\infty, Y_{(2)}] - \alpha(-\infty, Y_{(1)}] = \alpha(Y_{(1)}, Y_{(2)}]$ with
the mean of $\alpha(-\infty, Y_{(1)}]$. If the coverage property were true, then, in
particular, $E\alpha(-\infty, Y_{(1)}] = E\alpha(Y_{(1)}, Y_{(2)}]$ or, equivalently,

$$2E\alpha(-\infty, Y_{(1)}] = E\alpha(-\infty, Y_{(2)}]. \qquad (4.1)$$

Theorem 2 below, which gives the distribution of the rth order statistic of
a sample of size n from a Dirichlet process, will be used to show that
equality (4.1) does not hold. Since the Dirichlet process places all its mass
on discrete distribution functions (see, for example, Ferguson (1973),
Blackwell (1973), and Berk and Savage (1977)), there can be ties in
samples from Dirichlet processes. Nonetheless, one can order the random
variables from a sample of size n from a Dirichlet process and derive the
distribution of the order statistics.


Theorem 2: For $1 \le r \le n$, the distribution of the rth order statistic $X_{(r)}$
of a sample of size n from a Dirichlet process with parameter $\alpha$ is given by

$$F_r(x) = \sum_{i=r}^{n} \binom{n}{i} \frac{\alpha(-\infty, x]^{[i]}\, \alpha(x, \infty)^{[n-i]}}{\alpha(\mathbb{R})^{[n]}}. \qquad (4.2)$$

Proof: Suppose F is a known distribution function with $X_1, \ldots, X_n$
the random sample from F. Then the distribution of the rth order
statistic is

$$\Pr\{X_{(r)} \le x \mid F\} = \sum_{i=r}^{n} \binom{n}{i} F(x)^{i}\, [1 - F(x)]^{n-i}. \qquad (4.3)$$

If, in fact, F is a random distribution function from a Dirichlet process,
then by definition, for x fixed, F(x) has a beta distribution with parameters
$\alpha(-\infty, x]$ and $\alpha(x, \infty)$. Then integrating both sides of (4.3) over F, one obtains

$$F_r(x) = \Pr\{X_{(r)} \le x\} = \sum_{i=r}^{n} \binom{n}{i} \int F(x)^{i}\, [1 - F(x)]^{n-i}\, dQ_\alpha(F) = \sum_{i=r}^{n} \binom{n}{i} \frac{\alpha(-\infty, x]^{[i]}\, \alpha(x, \infty)^{[n-i]}}{\alpha(\mathbb{R})^{[n]}}.$$

The final equality above follows from the moments of the beta (Dirichlet)
distribution. ||

It is a simple matter to also derive the joint distribution of the
rth and sth order statistics (r < s).

Theorem 3: If $X_1, \ldots, X_n$ is a sample of size n from a Dirichlet process
with parameter $\alpha$, the joint distribution of the rth order statistic $X_{(r)}$
and the sth order statistic $X_{(s)}$, for $1 \le r < s \le n$, is given by

$$F_{r,s}(x, y) = \sum_{i=r}^{n} \sum_{j=\max(0,\, s-i)}^{n-i} \binom{n}{i,\, j,\, n-i-j} \frac{\alpha(-\infty, x]^{[i]}\, \alpha(x, y]^{[j]}\, \alpha(y, \infty)^{[n-i-j]}}{\alpha(\mathbb{R})^{[n]}}, \qquad x < y. \qquad (4.4)$$


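Proof: Given the distribution function F, the joint distribution of $X_{(r)}$
and $X_{(s)}$ is, for x < y,

$$\Pr\{X_{(r)} \le x,\ X_{(s)} \le y \mid F\} = \sum_{i=r}^{n} \sum_{j=\max(0,\, s-i)}^{n-i} \binom{n}{i,\, j,\, n-i-j} F(x)^{i}\, [F(y) - F(x)]^{j}\, [1 - F(y)]^{n-i-j}. \qquad (4.5)$$

(Here i counts the observations in $(-\infty, x]$ and j those in $(x, y]$; the event
occurs exactly when $i \ge r$ and $i + j \ge s$.) Integrating both sides of (4.5) with
respect to F, using the definition of the Dirichlet process for the partition
$(-\infty, x], (x, y], (y, \infty)$, and employing the moments of the Dirichlet
distribution completes the proof. ||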
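Formulas (4.2) and (4.4) depend on $\alpha$ only through the masses of the cells of the partition, so both can be evaluated directly. The sketch below takes $\alpha(-\infty, x]$, $\alpha(-\infty, y]$ and $\alpha(\mathbb{R})$ as inputs (names hypothetical) and checks one consequence: if $\alpha$ puts no mass above y, then $X_{(s)} \le y$ almost surely and $F_{r,s}(x, y)$ reduces to $F_r(x)$.

```python
from math import comb, factorial


def asc(y, k):
    """Ascending factorial y^[k] = y(y+1)...(y+k-1), with y^[0] = 1."""
    out = 1.0
    for j in range(k):
        out *= y + j
    return out


def F_r(mass_x, total, r, n):
    """(4.2): Pr{X_(r) <= x}, with mass_x = alpha(-inf, x] and total = alpha(R)."""
    return sum(comb(n, i) * asc(mass_x, i) * asc(total - mass_x, n - i)
               for i in range(r, n + 1)) / asc(total, n)


def F_rs(mass_x, mass_y, total, r, s, n):
    """(4.4): Pr{X_(r) <= x, X_(s) <= y} for x < y,
    with mass_x = alpha(-inf, x] and mass_y = alpha(-inf, y]."""
    a, b, c = mass_x, mass_y - mass_x, total - mass_y   # (-inf, x], (x, y], (y, inf)
    acc = 0.0
    for i in range(r, n + 1):
        for j in range(max(0, s - i), n - i + 1):
            multinom = factorial(n) // (factorial(i) * factorial(j) * factorial(n - i - j))
            acc += multinom * asc(a, i) * asc(b, j) * asc(c, n - i - j)
    return acc / asc(total, n)


# alpha = Lebesgue measure on [0, 1] (an assumed example), x = 0.3, y = 1
print(F_r(0.3, 1.0, 1, 2), F_rs(0.3, 1.0, 1.0, 1, 2, 2))   # both approximately 0.405
```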

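By an application of Theorem 2, the distributions of the first and
second order statistics, for the case N = 2, are

$$F_1(x) = \frac{2\alpha(-\infty, x]\, \alpha(x, \infty) + \alpha(-\infty, x]^{[2]}}{\alpha(\mathbb{R})^{[2]}}, \qquad F_2(x) = \frac{\alpha(-\infty, x]^{[2]}}{\alpha(\mathbb{R})^{[2]}}.$$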

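It suffices to consider the special case $\alpha(-\infty, x] = x$ for $x \in [0, 1]$, with
$\alpha([0, 1]) = 1$ and $\alpha(\mathbb{R} - [0, 1]) = 0$. Then $\alpha(\mathbb{R})^{[2]} = 2$, so that for $x \in [0, 1]$,
$F_1(x) = x(1 - x) + \tfrac{1}{2}x(x + 1)$ and $F_2(x) = \tfrac{1}{2}x(x + 1)$, and

$$E\alpha(-\infty, Y_{(1)}] = E(Y_{(1)}) = \int_0^1 x\, dF_1(x) = \int_0^1 (1 - F_1(x))\, dx = \int_0^1 \Bigl\{1 - x(1 - x) - \tfrac{1}{2}x(x + 1)\Bigr\}\, dx = \frac{5}{12}.$$

In a similar fashion,

$$E\alpha(-\infty, Y_{(2)}] = \int_0^1 \Bigl\{1 - \tfrac{1}{2}x(x + 1)\Bigr\}\, dx = \frac{7}{12} \ne 2 \cdot \frac{5}{12}.$$

Thus equation (4.1) does not hold for this special case. Therefore, the
coverage property is not valid for a sample from a Dirichlet process.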
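The two expectations in this example can be reproduced exactly by expanding $F_r(x)$ from (4.2) as a polynomial in x with rational coefficients and integrating $1 - F_r$ over [0, 1]. The sketch below (helper names hypothetical) hard-codes the special case $\alpha(-\infty, x] = x$ on [0, 1] used above.

```python
from fractions import Fraction
from math import comb


def poly_add(p, q):
    """Sum of two polynomials stored as coefficient lists (index = power of x)."""
    size = max(len(p), len(q))
    p = p + [Fraction(0)] * (size - len(p))
    q = q + [Fraction(0)] * (size - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_mul(p, q):
    """Product of two coefficient-list polynomials."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def asc_poly(base, k):
    """Ascending factorial base^[k] = base(base+1)...(base+k-1) for a polynomial base."""
    out = [Fraction(1)]
    for j in range(k):
        out = poly_mul(out, poly_add(base, [Fraction(j)]))
    return out


def expected_order_stat(r, n):
    """E alpha(-inf, Y_(r)] = E(Y_(r)) when alpha is Lebesgue measure on [0, 1]:
    the integral over [0, 1] of 1 - F_r(x), with F_r taken from (4.2)."""
    x = [Fraction(0), Fraction(1)]             # alpha(-inf, x] = x
    one_minus_x = [Fraction(1), Fraction(-1)]  # alpha(x, inf) = 1 - x
    numerator = [Fraction(0)]
    for i in range(r, n + 1):
        term = poly_mul(asc_poly(x, i), asc_poly(one_minus_x, n - i))
        numerator = poly_add(numerator, [comb(n, i) * c for c in term])
    denom = asc_poly([Fraction(1)], n)[0]      # alpha(R)^[n] = 1^[n] = n!
    integral_of_F = sum(c / (k + 1) for k, c in enumerate(numerator)) / denom
    return 1 - integral_of_F


e1, e2 = expected_order_stat(1, 2), expected_order_stat(2, 2)
print(e1, e2, 2 * e1 == e2)   # prints: 5/12 7/12 False
```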

5. TWO-SIDED PREDICTION INTERVALS WITH THE DIRICHLET PRIOR


The problem of generating two-sided $100\gamma$ percent prediction intervals
of the form (x, y), for x < y, to contain at least q of N future observations
from a Dirichlet process requires more notational development.
Let I, J, K, L, and M (all dependent on x and/or y, with the notational
dependences suppressed) be random variables for the number of $Y_1, \ldots, Y_N$
that are less than x, equal to x, between x and y, equal to y, and greater
than y, respectively. (Note that I, J, and K have been redefined and should
not be confused with their use in Section 3.)

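Theorem 4: If $X_1, \ldots, X_n$ is a sample from a Dirichlet process P (say) with
parameter $\alpha$ and $Y_1, \ldots, Y_N$ is a second sample from the conditional process
P given $X_1, \ldots, X_n$, then for x and y with x < y,

$$\Pr\{(I, J, K, L, M) = (i, j, k, \ell, m) \mid X_1, \ldots, X_n\} = \binom{N}{i,\, j,\, k,\, \ell,\, m} \frac{\alpha'(-\infty, x)^{[i]}\, \alpha'(\{x\})^{[j]}\, \alpha'(x, y)^{[k]}\, \alpha'(\{y\})^{[\ell]}\, \alpha'(y, \infty)^{[m]}}{\alpha'(\mathbb{R})^{[N]}}, \qquad (5.1)$$

where $\alpha' = \alpha + \sum_{i=1}^{n} \delta_{X_i}$.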

Proof: The conditional probability distribution of (I, J, K, L, M) given
$X_1, \ldots, X_n$ and F is obtained by a multinomial argument. Integration over F
with respect to the posterior Dirichlet process with parameter $\alpha'$, and application
of the moments of the Dirichlet distribution for $(F(x^-),\ F(x) - F(x^-),\ F(y^-) - F(x),\ F(y) - F(y^-),\ 1 - F(y))$,
yields (5.1). ||

The distribution of (I, J, K, L, M) given $X_1, \ldots, X_n$ is Dirichlet compound
multinomial with parameters $N, \alpha'(-\infty, x), \alpha'(\{x\}), \alpha'(x, y), \alpha'(\{y\}), \alpha'(y, \infty)$.
Note that if n = 0 and x = y, so that K = 0 and J and L are combined,
Theorem 1 is obtained as a special case.

For x < y, define

$$R(x, y) = \Pr\{\text{at least } q \text{ of the } Y\text{'s are in the interval } (x, y)\} = \sum_{p=q}^{N} \Pr\{\text{exactly } p \text{ of the } Y\text{'s are in the interval } (x, y)\}.$$

Note that, for x fixed, R(x, y) is increasing in y and that, for y fixed,
R(x, y) is decreasing in x. The prediction interval problem is to find
$(x_0, y_0)$ such that $R(x_0, y_0) = \gamma$. From Theorem 4 and the fact
that the marginals of the Dirichlet compound multinomial are beta compound
binomial, K has a beta compound binomial distribution with parameters $N, \alpha'(x, y),
\alpha'(\mathbb{R} - (x, y))$. Thus,

$$R(x, y) = \sum_{p=q}^{N} \Pr\{K = p\} = \sum_{p=q}^{N} \binom{N}{p} \frac{\alpha'(x, y)^{[p]}\, \alpha'(\mathbb{R} - (x, y))^{[N-p]}}{\alpha'(\mathbb{R})^{[N]}}. \qquad (5.2)$$

A trial-and-error search for $(x_0, y_0)$ such that $R(x_0, y_0) = \gamma$ is one way
of proceeding. The solution (if it exists) need not be unique, and in fact
an uncountably infinite number of pairs is possible. Note that as x or y
is shifted, $\alpha'(x, y)$ may change, so that in many cases a computer is an
invaluable aid in the determination of such prediction intervals, even for small
values of n and N.
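Because R(x, y) is nondecreasing in y for fixed x, one way to organize the trial-and-error search is a bisection on y for each candidate left endpoint x. The sketch below is illustrative rather than the authors' procedure: it assumes the caller supplies $\alpha'(x, y)$ through a function, and `R_two_sided`, `upper_endpoint` and `uniform_inside` are hypothetical names.

```python
from math import comb


def asc(y, k):
    """Ascending factorial y^[k] = y(y+1)...(y+k-1), with y^[0] = 1."""
    out = 1.0
    for j in range(k):
        out *= y + j
    return out


def R_two_sided(inside, total, q, N):
    """(5.2) with inside = alpha'(x, y) and total = alpha'(R)."""
    outside = total - inside   # alpha'(R - (x, y))
    return sum(comb(N, p) * asc(inside, p) * asc(outside, N - p)
               for p in range(q, N + 1)) / asc(total, N)


def upper_endpoint(x, gamma, q, N, inside_mass, total, y_hi, tol=1e-10):
    """For fixed x, bisect on y for R(x, y) = gamma, using that R is
    nondecreasing in y. Assumes R(x, y_hi) >= gamma. The returned y has
    R(x, y) >= gamma; near an atom of alpha', R can jump past gamma."""
    lo, hi = x, y_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if R_two_sided(inside_mass(x, mid), total, q, N) >= gamma:
            hi = mid
        else:
            lo = mid
    return hi


def uniform_inside(x, y):
    """alpha(x, y) for alpha = Lebesgue measure on [0, 1] (no data, an assumed example)."""
    return max(0.0, min(y, 1.0) - max(x, 0.0))


# Trial and error over a few left endpoints: N = 2, q = 2, gamma = 0.5, alpha(R) = 1
for x in (0.0, 0.1, 0.2, 0.3):
    y = upper_endpoint(x, 0.5, 2, 2, uniform_inside, 1.0, y_hi=1.0)
    print(round(x, 2), round(y, 4))
```

Among the resulting pairs one might, for instance, keep the shortest interval; that selection rule is an assumption, not part of (5.2).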

It is clear that one could easily construct prediction intervals of the
form [x, y], [x, y), or (x, y] instead of (x, y). For example, for the interval [x, y],
one employs the fact that J + K + L has a beta compound binomial distribution
with parameters $N, \alpha'[x, y], \alpha'(\mathbb{R} - [x, y])$ and proceeds as above.

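In the event that $\alpha(\mathbb{R})$ is small, there may be no solution to $R(x, y) = \gamma$.
In that case, one could find $x_1$ and $y_1$ such that $R(x_1, y_1) < \gamma \le R[x_1, y_1]$, where
$R[x, y]$ denotes the probability that at least q of the Y's are in the closed interval
[x, y]. Then $[x_1, y_1]$ is a prediction interval for at least q of N future observations
with prediction coefficient at least $\gamma$.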
6. AN EXAMPLE


In this section, two-sided non-Bayesian nonparametric and Bayesian
nonparametric (Dirichlet) prediction intervals for at least q of N future
observations are illustrated using a numerical example originally introduced by
Hahn (1970a). He gives the following data on failure times (in months) of
a new type of machine, recorded for five prototypes: 51.4, 49.5, 48.7, 49.3,
51.6. To illustrate our procedure we suppose that there is prior evidence
(from past experience relating to a similar machine) which suggests that the
underlying life distribution can be approximated by a normal distribution with
a mean of 50 and a standard deviation of 1.25. Thus, to apply the two-sided
Bayesian nonparametric prediction interval introduced in Section 5, we
set $\alpha(-\infty, x]/\alpha(\mathbb{R}) = \Phi((x - 50)/1.25)$, where $\Phi(\cdot)$ is the standard normal cumulative
distribution function. We must also specify a value for $\alpha(\mathbb{R})$. This
specification hinges on the degree of confidence or belief that one invests in this
choice for the measure $\alpha$. For this case, suppose we set $\alpha(\mathbb{R}) = 5$. Roughly
speaking, this corresponds to a prior sample size of 5 observations. Since
n also equals 5 here, the prior and the initial sample of size 5 are equally
weighted in their contribution to the prediction interval. Rather than
construct the different prediction intervals (which may not be unique) for
a fixed prediction coefficient, for simplicity we choose the prediction
interval $(X_{(1)}, X_{(5)})$ and compute the prediction coefficients. (Note that any
order statistics could have been chosen for the sake of comparison of Dirichlet
and nonparametric prediction intervals, but that, unlike the Dirichlet intervals,
the nonparametric ones demand that only order statistics of the initial sample
serve as endpoints.)
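For this example the updated measure can be written down explicitly: $\alpha'(A) = 5\,\Pr\{X \in A\}$, with X normal with mean 50 and standard deviation 1.25, plus the number of observations in A. The sketch below evaluates $\alpha'$ on (48.7, 51.6) and on its closure; the helper names are hypothetical, and $\Phi$ is computed from `math.erf`.

```python
from math import erf, sqrt

DATA = [51.4, 49.5, 48.7, 49.3, 51.6]   # Hahn (1970a) failure times, in months
PRIOR_MEAN, PRIOR_SD = 50.0, 1.25


def normal_cdf(z):
    """Standard normal cdf Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def alpha_prime(lo, hi, total_mass, closed=False, data=DATA):
    """alpha'(lo, hi), or alpha'[lo, hi] if closed, for
    alpha' = total_mass * N(50, 1.25^2) + unit point masses at the data."""
    prior_part = total_mass * (normal_cdf((hi - PRIOR_MEAN) / PRIOR_SD)
                               - normal_cdf((lo - PRIOR_MEAN) / PRIOR_SD))
    if closed:
        atoms = sum(1 for d in data if lo <= d <= hi)
    else:
        atoms = sum(1 for d in data if lo < d < hi)
    return prior_part + atoms


x1, x5 = min(DATA), max(DATA)   # X_(1) = 48.7, X_(5) = 51.6
print(alpha_prime(x1, x5, 5.0), alpha_prime(x1, x5, 5.0, closed=True))
# roughly 5(.7505) + 3 and 5(.7505) + 5
```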
Consider the two-sided prediction interval for a single future observation
(N = 1). The non-Bayesian nonparametric prediction coefficient for
the interval $(X_{(1)}, X_{(5)}) = (48.7, 51.6)$ based on the n = 5 initial observations
is as follows [see Wilks (1942) or Danziger and Davis (1964) for details]:

$$\Pr\{\text{exactly } N_0 \text{ of } N \text{ future observations fall in } (X_{(1)}, X_{(n)})\} = \frac{n(n-1)(N - N_0 + 1)\, N!\, (N_0 + n - 2)!}{N_0!\, (N + n)!}. \qquad (6.1)$$

Substituting into (6.1) with n = 5 and $N = N_0 = 1$ yields the value 2/3 for
the prediction coefficient. Contrast this with the Dirichlet prediction
coefficient, for the same interval, as given by (5.2):

$$R(X_{(1)}, X_{(5)}) = \frac{\alpha'(X_{(1)}, X_{(5)})^{[1]}}{\alpha'(\mathbb{R})^{[1]}} = \frac{5(.7505) + 3}{10} = .675.$$

Here the prior contributes $5\{\Phi(1.28) - \Phi(-1.04)\} \approx 5(.7505)$, the three interior
observations 49.3, 49.5, and 51.4 contribute one unit each, and $\alpha'(\mathbb{R}) = 5 + 5 = 10$.

However, if the interval is expanded to include the endpoints, the
nonparametric prediction coefficient does not change, but the discreteness
of the Dirichlet process causes an increase in the Dirichlet coefficient to
$\alpha'[X_{(1)}, X_{(5)}]/\alpha'(\mathbb{R}) = \{5(.7505) + 5\}/10 = .875$.

To illustrate the crucial nature of the choice of $\alpha(\mathbb{R})$, suppose
$\alpha(\mathbb{R}) = 20$. Then the Dirichlet prediction coefficient of (48.7, 51.6) is
$\{20(.7505) + 3\}/25 = .720$. The limit as $\alpha(\mathbb{R})$ tends to infinity can also
be easily computed, since in general the coefficient is $\{\alpha(\mathbb{R})(.7505) + 3\}/\{\alpha(\mathbb{R}) + 5\}$.
As $\alpha(\mathbb{R})$ increases, greater confidence is placed on the prior at the expense of
the initial sample. In this case that is reflected by the result that in the
limit the prediction coefficient for (48.7, 51.6) (and also for [48.7, 51.6]) is
.7505. This value is of course Pr(48.7 < X < 51.6), where X is normal with mean
50 and standard deviation 1.25.

Note that the nonparametric and Dirichlet prediction coefficients also
do not agree as $\alpha(\mathbb{R})$ tends to zero (corresponding to less and less confidence
in the prior). In our example, the nonparametric coefficient for (48.7, 51.6)
remains 2/3, whereas the Dirichlet coefficient approaches 3/5 = .6 as $\alpha(\mathbb{R})$ tends
to zero.
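The coefficients quoted in this section can be recomputed in a few lines: (6.1) with exact rational arithmetic, and (5.2) with N = q = 1, where it reduces to $\alpha'(I)/\alpha'(\mathbb{R})$ for the interval I. The function names are hypothetical, and very large and very small values of $\alpha(\mathbb{R})$ stand in for the two limits discussed above.

```python
from fractions import Fraction
from math import erf, factorial, sqrt

DATA = [51.4, 49.5, 48.7, 49.3, 51.6]   # Hahn (1970a) failure times, in months
LO, HI = min(DATA), max(DATA)            # X_(1) = 48.7, X_(5) = 51.6


def normal_cdf(z):
    """Standard normal cdf Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def nonparametric_coefficient(n, N, N0):
    """(6.1): Pr{exactly N0 of N future observations fall in (X_(1), X_(n))}."""
    return Fraction(n * (n - 1) * (N - N0 + 1) * factorial(N) * factorial(N0 + n - 2),
                    factorial(N0) * factorial(N + n))


def dirichlet_coefficient(total_mass, closed=False):
    """(5.2) with N = q = 1: alpha'(I) / alpha'(R), I = (X_(1), X_(5)) or its closure."""
    prior_content = normal_cdf((HI - 50.0) / 1.25) - normal_cdf((LO - 50.0) / 1.25)
    if closed:
        atoms = sum(1 for d in DATA if LO <= d <= HI)
    else:
        atoms = sum(1 for d in DATA if LO < d < HI)
    return (total_mass * prior_content + atoms) / (total_mass + len(DATA))


print(nonparametric_coefficient(5, 1, 1))   # 2/3
for c in (5.0, 20.0, 1e9, 1e-9):
    print(c, round(dirichlet_coefficient(c), 3), round(dirichlet_coefficient(c, closed=True), 3))
```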

